Representing and Accessing Multi-Level Annotations in MMAX2
Abstract
MMAX2 is a versatile, XML-based annotation tool which has already been used in a variety of annotation projects. It is also the tool of choice in the ongoing project DIANA-Summ, which deals with anaphora resolution and its application to spoken dialog summarization. The project uses the ICSI Meeting Corpus (Janin et al., 2003), a corpus of multi-party dialogs which contains a considerable amount of simultaneous speech. It features a semiautomatically generated segmentation in which the corpus developers tried to track the flow of the dialog by inserting segment starts approximately whenever a person started talking. As a result, the corpus has some interesting structural properties, most notably overlap, that are challenging for an XML-based representation format. The following brief overview of MMAX2 focuses on this aspect, using examples from the ICSI Meeting Corpus.
Similar resources
Multi-level annotation of linguistic data with MMAX2
This paper describes how richly annotated corpora can be created with the annotation tool MMAX2. The description is from the point of view of Computational Linguistics, a discipline where annotated corpora are often used as resources for software development. The paper outlines the important steps in the life cycle of an annotation and details how the tool MMAX2 can be employed in each of them.
EXCOTATE: An Add-on to MMAX2 for Inspection and Exchange of Annotated Data
In this paper, we present an add-on called EXCOTATE for the annotation tool MMAX2. The add-on interacts with annotated data stored in and spread over different MMAX2 projects. The data can be inspected, revised, and analyzed in a tabular format, and will be reintegrated into MMAX2 projects afterwards. It is based on Microsoft Excel with extensive usage of the script language Visual Basic for App...
Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
We present an XML annotation format (MEANING Annotation Format, MAF) specifically designed to represent and integrate different levels of linguistic annotations and a tool that provides flexible access to them (MEANING Browser). We describe our experience in integrating linguistic annotations coming from different sources, and the solutions we adopted to implement efficient access to corpora an...
Representing Multimodal Linguistics Annotated data
The question of interoperability for linguistic annotated resources requires covering different aspects. First, it requires a representation framework making it possible to compare, and potentially merge, different annotation schema. In this paper, a general description level representing the multimodal linguistic annotations is proposed. It focuses on time and data content representation: This...
Using Semantic Metadata for Discovery and Integration of Heterogeneous Ecological Data
Effective discovery and integration of ecological data within data management systems requires rich semantic information that can describe and relate the types of information contained within disparate data sets. Within the Semtools project, we have developed approaches for expressing and representing semantic annotations of data sets for supplementing attribute and data-level metadata with ter...